This study looked at body temperature distributions of diurnal desert lizards across three continents (North America, Africa, and Australia). Data was gathered over several decades and later recycled for use in this study. Body temperature distributions were expected to exhibit left-skewness. This study assessed whether skewness would differ between phylogeny, location, body size, or median body temperature, but no significant differences were found.
Density plots were constructed for each African Kalahari desert lizard species and standardized so the modal temperature was set to zero. This allowed for comparison of skewness of body temperature between species. This also allowed visualization of left skewness with supporting data showing that seven species exhibited significant left skewness.
Supporting data included mean body temperature, median body temperature, mean absolute deviation, minimum and maximum body temperatures, number of samples per group, and the D’Agostino skewness coefficient and the related p value. Minimum sample size for inclusion was N = 23. Calculations were run separately for each season as well as continent/latitudinal area. Data for Australia and Africa was able to be pooled, but North American data spanned 14 degrees of latitude and needed to be separated into two regions because this can bias skewness. It was noted that the D’Agostino skewness coefficient is very sensitive to sample size and did not provide significant results for any species with N < 50. However, these were included due to the designation of N = 23 as the minimum sample size. One-tailed tests were used due to the hypothesis that all samples would be left-skewed.
Phylogenetic analysis was not done because they were using older data which did not include enough deserts/species with overlapping clades to be appropriate for phylogenetic analysis. Essentially, the preliminary requirements were not met for a full factorial ANOVA.
Install {moments}.
Load the following packages:
> library(curl)
## Using libcurl 7.64.1 with LibreSSL/2.8.3
> library(ggplot2)
> library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x readr::parse_date() masks curl::parse_date()
> library(dplyr)
> library(moments)
Density plots of Tb of 10 Kalahari lizard species in summer: distributions are centred on modal temperature of each species. Left skewness is visually evident for many species and significant in seven (Table S1)
> f <- curl("https://raw.githubusercontent.com/maekell98/nkelley-data-replication-assignment/main/doi_10-5/KalahariTbData.csv")
> kl <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)
> head(kl)
## area number latitude species Tb time date family
## 1 L 15065 -26.38 Agama aculeata 35.2 12:25 11/28/69 Agamdidae
## 2 L 15087 -26.38 Agama aculeata 38.0 16:28 11/28/69 Agamdidae
## 3 K 15092 -25.75 Agama aculeata 37.2 10:57 11/29/69 Agamdidae
## 4 L 15123 -26.38 Agama aculeata 35.6 17:40 11/28/69 Agamdidae
## 5 K 15141 -25.75 Agama aculeata 37.0 12:50 11/30/69 Agamdidae
## 6 K 15144 -25.75 Agama aculeata 38.0 15:30 11/30/69 Agamdidae
Note: Southern latitude summer = December through February. Therefore, we must subset the data by date to include only data collected in the summer months.
“We arbitrarily set a minimum sample size of N = 23 Tb for inclusion.” Since Trachylepis variegata does not have this many samples, it will be excluded from further analysis at the end of the the following chunk.
> #in order to subset by date, dates must be coerced into a format recognized by r:
> #date formatting: https://stackoverflow.com/questions/38146160/how-to-convert-dd-mm-yy-to-yyyy-mm-dd-in-r
> kl$date <- as.Date(kl$date, "%m/%d/%Y")
>
> #kls = kalahari lizards summer subset
> #date subsetting: https://www.statology.org/subset-by-date-range-in-r/
> kld <- kl[kl$date >= "0069-12-02" & kl$date <= "0070-02-27", ]
>
> #not enough data for species Trachylepis spilogaster, deleting for further analyis
> kls <- subset(kld, species!="Trachylepis variegata")
> head(kls)
## area number latitude species Tb time date family
## 7 <NA> 15166 -27.07 Agama aculeata 38.4 14:18 0069-12-02 Agamdidae
## 9 <NA> 15192 -27.07 Agama aculeata 40.4 14:33 0069-12-02 Agamdidae
## 10 <NA> 15194 -27.07 Agama aculeata 36.2 14:54 0069-12-02 Agamdidae
## 11 <NA> 15233 -26.85 Agama aculeata 37.6 11:00 0069-12-04 Agamdidae
## 12 <NA> 15241 -26.80 Agama aculeata 37.8 8:50 0069-12-05 Agamdidae
## 13 <NA> 15242 -26.80 Agama aculeata 34.8 8:55 0069-12-05 Agamdidae
In order to obtain density plots by species, we first subset by species. Each subset is mutated so they are standardized for comparison with each other. This involves subtracting the body temperature mode from each body temperature measurement which essentially sets the mode for each sample to 0 without changing the relationship between the individual points. I was having a really hard time at first figuring our how they centered their distributions around the modal temperatures of each species. Ultimately, this was my solution. My plot differs somewhat from theirs, so this might be a source of that discrepancy. However, I don’t see a reason why my solution shouldn’t work.
> #used unique to quickly get a list of species in the kl data set
> unique(kld$species)
## [1] "Agama aculeata" "Heliobolus lugubris"
## [3] "Meroles squamulosus" "Meroles suborbitalis"
## [5] "Nucras tessellata" "Pedioplanis namaquensis"
## [7] "Pedioplanis_lineoocellata" "Trachylepis occidentalis"
## [9] "Trachylepis sparsa" "Trachylepis spilogaster"
## [11] "Trachylepis variegata"
> #define function for calculating mode: https://r-lang.com/mode-in-r/
> getmode <- function(v) {
+ uniqv <- unique(v)
+ uniqv[which.max(tabulate(match(v, uniqv)))]
+ }
>
> #now we create a subset for each species in which the mode of each Tb is adjusted to 0 and represented by column Tbm
> ss1 <- subset(kls, species == "Agama aculeata") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss2 <- subset(kls, species == "Heliobolus lugubris") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss3 <- subset(kls, species == "Meroles squamulosus") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss4 <- subset(kls, species == "Meroles suborbitalis") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss5 <- subset(kls, species == "Nucras tessellata") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss6 <- subset(kls, species == "Pedioplanis lineoocellata") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss7 <- subset(kls, species == "Pedioplanis namaquensis")%>% mutate(Tbm = Tb - getmode(Tb))
>
> ss8 <- subset(kls, species == "Trachylepis occidentalis") %>% mutate(Tbm = Tb - getmode(Tb))
>
> ss9 <- subset(kls, species == "Trachylepis sparsa") %>%
+ mutate(Tbm = Tb - getmode(Tb))
>
> ss10 <- subset(kls, species == "Trachylepis spilogaster") %>% mutate(Tbm = Tb - getmode(Tb))
Now we can run the ggplot. In order to show all the species subsets on the same plot, we add geom_density() with each of the different subsets. I feel there may have been a more stream-lined way to do this, but this did not take me too long either.
> #constructing density plots: http://www.sthda.com/english/wiki/ggplot2-density-plot-quick-start-guide-r-software-and-data-visualization
> s <- ggplot(ss1, aes(x=Tbm)) +
+ geom_density() +
+ geom_density(data = ss2) +
+ geom_density(data = ss3) +
+ geom_density(data = ss4) +
+ geom_density(data = ss5) +
+ geom_density(data = ss6) +
+ geom_density(data = ss7) +
+ geom_density(data = ss8) +
+ geom_density(data = ss9) +
+ geom_density(data = ss10) +
+ #add mean line:
+ geom_vline(aes(xintercept=0),
+ color="black", linetype="dashed") +
+ #classic theme seems most appropriate (removes grid background, adds tick marks, black and white)
+ theme_classic() +
+ #set axes:
+ xlim(-15, 5) + ylim(0, 0.5)+
+ #set tick marks:
+ scale_y_continuous(breaks = c(0.0, 0.1, 0.2))+
+ scale_x_continuous(breaks = c(-15, -10, -5, 0, 5))+
+ #add labels,
+ #code for degrees celsius symbol: https://stackoverflow.com/questions/51799118/writing-the-symbol-degrees-celsius-in-axis-titles-with-r-plotly/51799161
+ labs(title="Density plots Kalahari lizards, summer",x="Body temperature (\u00B0C), centered on modal temperature", y = "Density")
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
> s
We can see from the figure that most Kalahari lizard species body temperature distributions exhibit left skewness. This relationship was also observed in the original article figure 1.
For comparison, we have the origial plot:
Body temperature skewness (jittered) for desert lizards, with values by family (or by subfamily for Agamidae) and by continent. Three families are represented by one species. No pattern (desert, taxon) is evident.
Data was separated into three sets: Africa, Australia, and North America. Below, we load each data set and calculate the statistics for each. I used “kl”, “al”, and “nl” to name the data sets, representing “kalahari lizards”, “australia lizards”, and “north american lizards”. This syntax is retained throughout.
> f <- curl("https://raw.githubusercontent.com/maekell98/nkelley-data-replication-assignment/main/doi_10-5/KalahariTbData.csv")
> kl <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)
> kl['desert']='Africa'
> head(kl)
## area number latitude species Tb time date family desert
## 1 L 15065 -26.38 Agama aculeata 35.2 12:25 11/28/69 Agamdidae Africa
## 2 L 15087 -26.38 Agama aculeata 38.0 16:28 11/28/69 Agamdidae Africa
## 3 K 15092 -25.75 Agama aculeata 37.2 10:57 11/29/69 Agamdidae Africa
## 4 L 15123 -26.38 Agama aculeata 35.6 17:40 11/28/69 Agamdidae Africa
## 5 K 15141 -25.75 Agama aculeata 37.0 12:50 11/30/69 Agamdidae Africa
## 6 K 15144 -25.75 Agama aculeata 38.0 15:30 11/30/69 Agamdidae Africa
> #Basic R script for calculating stats describing Tb distributions:
> klskew <- kl %>%
+ group_by(family) %>%
+ summarise(
+ meanTb = round(mean(Tb, na.rm = TRUE), digits = 2),
+ medTb = median(Tb,na.rm = TRUE),
+ MAD = round(mad(Tb, na.rm = TRUE), digits = 2),
+ minTb = min(Tb, na.rm = TRUE),
+ maxBT = max(Tb, na.rm = TRUE),
+ nBT = n(),
+ SkewCoef = agostino.test(Tb, alternative = "greater")$statistic[1],
+ SkewP = agostino.test(Tb, alternative = "greater")$p.value,
+ desert = unique(desert))
> g <- curl("https://raw.githubusercontent.com/maekell98/nkelley-data-replication-assignment/main/doi_10-5/AustraliaTbData.csv")
> al<- read.csv(g, header = TRUE, sep = ",", stringsAsFactors = FALSE)
> al['desert']='Australia'
> #Here I rename a row because the other two data sets use Tb as the column name for blood temperature but this one for some reason used "BT". Renaming allows for later analysis.
> names(al)[4]<-paste("Tb")
> head(al)
## area number species Tb time date microhabitat
## 1 E 10448 Varanus eremius 41.0 14:35 01/20/66 terrestrial
## 2 <NA> 9828 Ctenophorus reticulatus 31.8 15:15 10/03/66 semi-arboreal
## 3 <NA> 9830 Ctenophorus reticulatus 31.6 15:24 10/03/66 semi-arboreal
## 4 <NA> 9832 Lophognathus longirostris 37.6 11:15 10/04/66 arboreal
## 5 <NA> 9834 Ctenophorus caudicinctus 38.0 14:00 10/05/66 saxicolous
## 6 <NA> 9835 Ctenophorus isolepis 39.2 16:00 10/05/66 terrestrial
## family desert
## 1 Varanidae Australia
## 2 Agamidae Australia
## 3 Agamidae Australia
## 4 Agamidae Australia
## 5 Agamidae Australia
## 6 Agamidae Australia
> alskew <- al %>%
+ group_by(family) %>%
+ summarise(
+ meanTb = round(mean(Tb, na.rm = TRUE), digits = 2),
+ medTb = median(Tb,na.rm = TRUE),
+ MAD = round(mad(Tb, na.rm = TRUE), digits = 2),
+ minTb = min(Tb, na.rm = TRUE),
+ maxBT = max(Tb, na.rm = TRUE),
+ nBT = n(),
+ SkewCoef = agostino.test(Tb, alternative = "greater")$statistic[1],
+ SkewP = agostino.test(Tb, alternative = "greater")$p.value,
+ desert = unique(desert))
> h <- curl("https://raw.githubusercontent.com/maekell98/nkelley-data-replication-assignment/main/doi_10-5/NorthAmericanTbData.csv")
> nl <- read.csv(h, header = TRUE, sep = ",", stringsAsFactors = FALSE)
> nl['desert']='North Amer.'
> head(nl)
## site number latitude NS species Tb time date
## 1 NAN 5169 42.2 north Aspidoscelis tigris 38.2 7:17 07/01/62
## 2 NAN 5170 42.2 north Phrynosoma platyrhinos 32.0 7:20 07/01/62
## 3 NAN 5171 42.2 north Aspidoscelis tigris 38.0 7:29 07/01/62
## 4 NAN 5172 42.2 north Uta stansburiana 36.0 7:34 07/01/62
## 5 NAN 5173 42.2 north Uta stansburiana 33.2 7:43 07/01/62
## 6 NAN 5174 42.2 north Uta stansburiana 35.4 7:44 07/01/62
## family desert
## 1 Teiidae North Amer.
## 2 Phrynosomatidae North Amer.
## 3 Teiidae North Amer.
## 4 Phrynosomatidae North Amer.
## 5 Phrynosomatidae North Amer.
## 6 Phrynosomatidae North Amer.
> nlskew <- nl %>%
+ group_by(family) %>%
+ summarise(
+ meanTb = round(mean(Tb, na.rm = TRUE), digits = 2),
+ medTb = median(Tb,na.rm = TRUE),
+ MAD = round(mad(Tb, na.rm = TRUE), digits = 2),
+ minTb = min(Tb, na.rm = TRUE),
+ maxBT = max(Tb, na.rm = TRUE),
+ nBT = n(),
+ SkewCoef = agostino.test(Tb, alternative = "greater")$statistic[1],
+ SkewP = agostino.test(Tb, alternative = "greater")$p.value,
+ desert = unique(desert))
For analysis of all lizards in the same figure, we combine all the data frames.
> #bind data -- retain all columns but combine columns in common; NA for columns not in common
> #https://www.geeksforgeeks.org/combine-two-dataframes-in-r-with-different-columns/
> ld <- bind_rows(kl, nl, al)
> #oop. Maybe I didn't need this. I actually needed to combine the statistics data frames which I do in the next line.
>
> #combine skew summaries:
> lskew <- bind_rows(nlskew, alskew, klskew)
Now we use a dot plot to show the skewness by lizard family and country of origin.
> #make a dot plot: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
> lplot <-
+ ggplot(lskew, aes(x=SkewCoef, y=family, group=desert)) +
+ #specify shape/color of points: http://www.sthda.com/english/wiki/ggplot2-point-shapes
+ geom_point(aes(shape=desert, color=desert), size = 3) +
+ scale_shape_manual(values=c(25, 8, 17))+
+ scale_color_manual(values=c('red', 'blue', 'grey48'))+
+ #frustrated because I couldn't get the Africa triangle to fill. I think it would've worked if all the shapes were fillable, but since two were not, the one that is refuses to fill... I tried to get around this for quite a while, but did not find the solution.
+ scale_fill_manual(values='red', 'blue', 'grey48')+
+ labs(x = "Skewness", y = "Families or subfamilies") +
+ theme_classic() +
+ xlim(-2.0, 0.5) +
+ scale_x_continuous(breaks = c(-2.0, -1.5, -1.0, -0.5, 0.0, 0.5)) +
+ geom_vline(aes(xintercept=0),
+ color="black", linetype="dashed") +
+ xlim(-2.0, 0.5)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
> lplot
For comparison, we have the original plot:
You can see my plot did not include the phylogenetic comparison. It also contains different families compared to the original plot. I was unable to find the families used in this plot when I was going through the data. I am not sure if they used other outside information to further classify families/subfamilies, but the lack of clarification made it impossible for me to truly replicate their figure.